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Network-based approaches to modeling and simulating biochemical reaction systems, such as 
those based on ordinary differential equations, require enumerating all species and reactions that 
can potentially exist in a system. The applicability of these approaches is limited, however, by the 
problem of combinatorial complexity, an explosion in the number of species and reactions due to 
protein-protein interactions and post-translational modifications in biochemical systems. Rule-based 
modeling avoids this problem by representing molecules as structured objects and encoding their 
interactions as pattern-based rules. Rule-based models can, on the one hand, be used to generate 
fully-enumerated reaction networks or, alternatively, simulated directly using particle-based kinetic 
Monte Carlo methods. The latter "network-free" approach produces exact stochastic trajectories 
with a computational cost that is independent of network size. However, memory and run time 
costs increase linearly (at best) with the number of particles being simulated, limiting the size of 
system that can be feasibly handled. Here, we present a hybrid particle/population simulation 
method that combines the best attributes of both the network-based and network-free approaches. 
The method takes as input a rule-based model augmented with a user-specified subset of species 
to treat as population variables rather than as particles. The model is then transformed by a 
process of partial network expansion into a form that can be simulated using a population-adapted 
network- free simulator. We have implemented the transformation method within the open-source 
rule-based modeling platform BioNetGen and the resulting hybrid model can be simulated using 
the particle-based simulator NFsim. Benchmark tests show that significant memory savings can 
be achieved using the new approach and a monetary cost analysis provides a practical measure of 
its utility. We also consider the possibility of applying accelerated-stochastic simulation methods, 
such as r leaping, to the population subnetwork of the hybrid model in order to improve run time 
efficiency. 



AUTHOR SUMMARY 

Rule-based modeling is a modeling paradigm that ad- 
dresses the problem of combinatorial complexity in bio- 
chemical systems. The key idea is to specify only those 
components of a biological macromolecule that are di- 
rectly involved in a biochemical transformation. Until 
recently, this "pattern-based" approach greatly simplified 
the process of model building but did nothing to improve 
the performance of model simulation. This changed with 
the introduction of "network-free" simulation methods, 
which operate directly on the compressed rule set of a 
rule-based model rather than on a fully-enumerated set of 
reactions and species. However, these methods represent 
every molecule in a system as a particle, limiting their use 
to systems containing less than a few million molecules. 
Here, we describe an extension to the network-free ap- 
proach that treats rare, complex species as particles and 
plentiful, simple species as population variables, while 
retaining the exact dynamics of the model system. By 
making more efficient use of computational resources for 
species that do not require the level of detail of a particle 
representation, this hybrid particle/population approach 
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can simulate systems much larger than is possible using 
network-free methods and is an important step towards 
realizing the practical simulation of detailed, mechanistic 
models of entire cells. 



INTRODUCTION 
Rule-based modeling 

Cell signaling encompasses the collection of cellular 
processes that sample the extracellular environment, pro- 
cess and transmit that information to the interior of the 
cell, and regulate cellular responses. In a typical scenario, 
molecules outside of the cell bind to cognate receptors on 
the cell membrane, resulting in conformational change 
or clustering of receptors. A complex series of protein 
binding and biochemical events then occurs, ultimately 
leading to the activation or deactivation of proteins that 
regulate gene expression or other cellular processes. 

A typical signaling protein possesses multiple interac- 
tion sites with activities that can be modified by direct 
chemical modification and by the effects of modification 
or interaction at other sites. This complexity at the pro- 
tein level leads to combinatorial complexity at the level 
of signaling networks posing a major barrier to the 
development of detailed and comprehensive models us- 
ing standard approaches that require explicit enumera- 
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tion of all species and reactions in a network [lH3[ . This 
problem of scalability has motivated the development of 
rule-based modeling languages, such as the BioNetGen 
language (BNGL) |3, H and Kappa @, Q, which pro- 
vide a rich yet concise description of signaling proteins 
and their interactions. The biochemical state-space ex- 
plosion problem is avoided by representing interacting 
molecules as structured objects and using rules to en- 
code their interactions. In the graph-based formalisms of 
BNGL and Kappa, molecules are represented as graphs 
and biochemical interactions by graph-rewriting rules. A 
rule can define either a single reaction or a class of reac- 
tions that share a common set of transformations (e.g., 
formation of a bond between molecules) and system-state 
requirements (e.g., that one or more components have 
a particular covalent modification). Rules are local in 
the sense that only the properties of the reactants that 
are transformed or are required for the transformation to 
take place affect their ability to react. 

An important characteristic of rule-based models is 
that they can encode both finite and infinite reaction net- 
works. If the network is finite and "not too large" then 
the network can be generated algorithmically @, H, 13-Hoj 
and simulated using a variety of network-based simula- 
tion methods (e.g., Refs. [lll - fl3| ). However, the rule- 
based methodology also provides a formalism for sim- 
ulating models with prohibitively large or infinite state 
spaces. Such a "network- free" approach involves repre- 
senting molecular complexes as particles and applying 
rule transformations to those particles at runtime us- 
ing a kinetic Monte Carlo update scheme [3, [ll| . This 
avoids enumeration of combinatorially-large state spaces 
because only the current configuration of the system and 
its potential transformations are tracked. As a result, 
network-free methods can efficiently simulate com plex 
svstems beyond the reach of network-based methods fl4| — 
llq |. However, because every molecule in the system 
is instantiated, network-free methods may require large 
amounts of computational memory, a potential limiting 
factor for simulating lar ge sy stems such as the regulatory 
networks of a whole cell [171 . Ha ). A typical eukaryotic cell, 
for example, contains on the order of 10 3 -10 4 protein- 
coding genes, 10 4 -10 5 mRNA molecules, and 10 9 -10 10 
protein molecules [l9l |20| , along with much larger num- 
bers of metabolites, lipids, and other small molecules. 
Simulating a cell at this level of detail using network-free 
methods would be impractical. There is a need, there- 
fore, to develop approaches that reduce the memory re- 
quirements of network- free methods. 



Computational complexity 

A common measure of the computational cost of an 
algorithm is its complexity. In basic terms, complexity 
measures how the computational cost increases as an al- 
gorithm is applied to increasingly larger data sets plj . 
For the simulation methods considered in this paper, two 



types of complexity arc important: (i) time complexity; 
the number of computational steps required to complete 
an algorithm, and (ii) space complexity; the number of 
memory units consumed during the execution of an algo- 
rithm. As a simple example, consider the task of search- 
ing for an item in a list of size N. If the list is unsortcd 
and we search by moving through it sequentially, then 
the algorithm will take at most N computational steps, 
i.e., the time complexity of the algorithm is linear in the 
number of items N (double the size of the list, double 
the average search time). However, if the items are orga- 
nized in a tree structure called a binary heap [2l[ , then it 
is possible to find any item in log 2 N steps, i.e., the time 
complexity is logarithmic in N. Note that both methods 
have space complexities that are linear in N. 

Network-based exact-stochastic simulation methods, 
like Gillespie's stochastic simulation algorithm (SSA) 
[T2I I22I [23| . treat species as lumped variables with a 
counter (i.e., a population). They have a time complexity 
that is linear, and a space complexity that is constant, in 
the number of particles being simulated [HI, [H| . Their 
main shortcoming is that the time complexity is often 
strongly correlated with the number of reactions in the 
network [24|, [Hj] . (There is no easy generalization of the 
time complexity with network size; the concentrations, 
rate constants, and density of the update network all con- 
tribute to the computational cost.) Network-free meth- 
ods, on the other hand, have a time complexity that is 
largely constant in the number of reactions in the network 
(approximately linear in the number of rules) [16| . This 
explains their utility for models that encode very large or 
infinite networks. However, being exact-stochastic meth- 
ods themselves, their time complexity remains linear (or 
worse [LH [26|]) in the number of particles being simu- 
lated. Moreover, since they represent every molecule in a 
system as an individual particle, their space complexity 
is linear in the number of particles. Thus, the significant 
reduction in time complexity for network-free algorithms 
with respect to network size is gained at the expense of 
an increase in space complexity with respect to particle 
number. This limits the size of system that can be fea- 
sibly simulated using network-free methods. In Table Q] 
we summarize the time and space complexities of the 
SSA and network-free algorithms with respect to parti- 
cle number and network/rule set size. 



Combining network-based and network-free 
methodologies 

The key idea pursued in this work is that memory con- 
sumption can be reduced in network-free simulators by 
representing large numbers of identical molecular com- 
plexes as single variables with population counters rather 
than as particles. However, retaining the ability to 
address combinatorial complexity requires retaining the 
particle representation for complexes that are comprised 
of many molecules and/or have complex substructures. 



3 



TABLE I. Time and space complexities for the SSA 
and network-free (NF) simulation algorithms. Scaling 
is shown with respect to particle number, P, and number of 
reactions, R, or rules, R. Typically, 7?<C-R. Complexity with 
respect to R or R is generally difficult to quantify and depends 
on the method used for event selection (e.g., Refs. [l^. [24. [25l. 
[HiHH) and the density of the update graph, which is model 
specific. Ranges given are for best- and worst-case scenarios. 

Particles (P) Rxns (R) or Rules (R) 



SSA Time O(P) 

SSA Space 0(1) 

NF Time 0{P)-0{P 2 )^_ 

NF Space O(P) 



0(1)-0(R 2 ) 
0{R)-0(R 2 ) 

0(1)-0(R 2 ) 
0{R)-0(R 2 ) 



The upper end of this range is based on a worst case that occurs 
for polymerizing systems that exhibit a solution-gel phase 
transition [H,|2(| (see Fig.gfS). 



Here, we present an approach, termed the hybrid par- 
ticle/population (HPP) simulation method, that accom- 
plishes this. Given a user-defined set of species to treat as 
population variables, the HPP method partially expands 
the network around the population species and then sim- 
ulates the partially-expanded model using a population- 
adapted particle-based method. By treating complex 
species as structured particles, HPP capitalizes on the 
near-constant time complexity with respect to network 
size characteristic of the network-free approach. How- 
ever, for the subset of species treated as population vari- 
ables, we take advantage of the constant memory require- 
ments of the network-based methodology. The reduction 
in computational memory requirements offered by HPP 
is an important step towards the practical simulation of 
systems approaching the size and complexity of a whole 

ceii miii. 

The paper is organized as follows. First, we provide 
brief descriptions of the example systems and perfor- 
mance metrics that we use to quantify the utility of 
the new method. We then present the HPP simulation 
method, which is the main contribution of this paper. 
Results of various performance analyses follow, which 
demonstrate that significant reductions in memory usage 
are possible using the new approach with little effect on 
simulation run time. Finally, we conclude with a discus- 
sion of the practical consequences of our work and some 
possible enhancements to the method. 



METHODS 

Example models 

We have tested the performance of the HPP method 
by applying it to four example models, summarized in 



Table |H] and discussed in further detail below. All of 
the models are biologically relevant and are either taken 
directly from the literature or are based on models that 
are taken from the literature. Complete BNGL model 
files, as well as HPP (partially-expanded) versions, for 
all example models are provided as Texts S4-S12 of the 
supporting information. 



Trivalent-ligand bivalent-receptor 

The trivalent-ligand bivalent-receptor (TLBR) model 
is a simplified representation of receptor aggregation fol- 
lowing multivalent ligand binding. TLBR has biological 
relevance to antigen-antibody interaction at the cell sur- 
face, where bivalent IgE-FceRI receptor complexes ag- 
gregate in the presence of multivalent antigen (29|. A 
theoretical study of the TLBR system was presented by 
Goldstein and Perelson [29| , who derived analytical con- 
ditions for a solution-gel phase transition in terms of 
binding equilibrium constants, free ligand concentration, 
and receptors per cell. A more recent study considered 
the effects of steric constraints and ring closure on the 
solution-gel phase transition in the TLBR system j26j. 

Despite its simplicity, the TLBR system experiences a 
state-space explosion near the solution-gel phase bound- 
ary. A computational study by Sneddon et al. [lj| repro- 
duced the analytical results of Goldstein and Perelson us- 
ing the BNGL-compliant network-free simulator NFsim 
[l6| . Due to large excesses of ligand and receptor under 
certain conditions, TLBR is a natural test case for HPP. 
We simulated the TLBR system using HPP with free 
ligand and receptor treated as population species. All 
simulations were per formed with parameters as defined 
in Monine et al. |26|], which lie within the solution-gel 
coexistence region. A cell-scale simulation assumed 1 nl 
extracellular volume per cell (10 6 cells/ml) with 8.3 nM 
ligand and 3xl0 5 receptors per cell. Simulations were per- 
formed at fractional cell volumes, /, ranging from 0.001 
to 0.1 with a lumping rate constant k_lump = 10 000/s 
(see below). Results are shown in Fig. |4] 



Actin polymerization 

Actin polymerizati on p lays a key role in cell morphol- 
ogy and motility [H, H3] ■ Roland et al. [3(j presented a 
dynamic model of actin polymerization featuring filament 
elongation by monomer addition, stabilization by ATP 
hydrolysis, and severing mediated by actin depolymeriz- 
ing factor ( ADF) /cofilin. Sneddon et al. [l6[ presented 
a rule-based formulation of the Roland et al. model and 
replicated their results using NFsim. The model features 
an excess of actin monomer and ADF molecules. There- 
fore, we speculated that substantial memory reduction 
would be possible using the hybrid approach. We applied 
HPP to the Sneddon et al. rule-based model of actin dy- 
namics (hereafter referred to as the Actin model) with 



4 



TABLE II. Summary of example models used to test the performance of the HPP method. Number of particles is 
for an NFsim simulation of a full cell volume (/ = 1). Fractional cell volumes as low as 0.001 and as high as 1 are used in the 
performance analyses (see "Example models" for details). 



Model 


Rules Reactions 


Species 


Particles Population 
(/ = 1) species 


Rules after 
PNE 


t_end (s) 


TLBR [16, 26, 29] 


4 


00 


oo 


5.3 xlO 6 


2 


9 


500 


Actin [16, 301 


21 


00 


oc 


1.2 xlO 6 


2 


25 


1000 


FceRI [16, 31 , 32j 


24 


58 276 


3744 


6.9 xlO 6 


1 / 6 


25 / 38 


2400 


EGFR [33-35] 


113 


415 858 


18 950 


2.2 xlO 6 


29 


159 


1200 



actin monomer and ADF treated as population species. 
A cell-scale simulation assumed 1 pi intracellular vol- 
ume with 1 /zM actin monomer and 1 fiM ADF/cofilin. 
Simulations were performed at fractional cell volumes, 
/, ranging from 0.01 to 1 with a lumping rate constant 
k_lvrmp = 10 000/s. Results are shown in Fig. [5] 



FceRI signaling 

FceRI is a membrane receptor that binds IgE antibod- 
ies. Signaling through FceRI regulates basophilic his- 
tamine release in response to IgE antibody-antigen in- 
teraction [38|. Faeder et al. [HI, HH developed a rule- 
based model of FceRI receptor assembly and activation 
in which receptor dimerization/ clustering is mediated by 
chemically cross-linked IgE, which serve as multivalent 
ligands. Dimcrizcd receptors are also transphosphory- 
latcd, leading to Syk an d Ly n recruitment and phospho- 
rylation. Sneddon et al. [l6| presented several extensions 
of the Faeder et al. model, including the gamma2 variant 
with two 7 phosphorylation sites. Particle-based NFsim 
simulations of the gamma2 model were found to be sub- 
stantially faster than network-based SSA simulations. 

Due to the excess of free ligand, the HPP method was 
applied to the gamma2 model to reduce memory con- 
sumption. The method was applied with two different 
sets of population species. In the first case, only free lig- 
and was treated as a population species (FceRLl). In 
the second, cytosolic Lyn and all four phosphorylation 
states of cytosolic Syk were also treated as populations 
(FceRI:6). A cell-scale simulation assumed 1 pi intracel- 
lular volume with 1 nl extracellular space per cell (10 6 
cells/ml), 10 nM ligand, and 4 x 10 5 receptors per cell. 
Simulations were performed at fractional cell volumes, /, 
ranging from 0.001 to 0.1 with a lumping rate constant 
k_lump= 10 000/s. Results are shown in Fig. |6] 



EGFR signaling 

A model of signaling through the epidermal growth 
factor receptor (EGFR), beginning with ligand binding 



and concluding with nuclear phospho-ERK activity, was 
constructed by combining three existing models: (i) a 
rule-based model of EGFR complex assembly (ii) 
a Ras activation model [IH; (iii) a pathway model of 
Raf, MEK and ERK activation [35[. Ras activation 
was coupled to the EGFR complex assembly by treat- 
ing receptor-recruited Sos as the Ras GEF. Activated 
Ras was coupled to the Raf /MEK /ERK cascade through 
RasGTP-Raf binding and subsequent phosphorylation of 
Raf. Parameters for the combined model were obtained 
from the respective models. However, parameters gov- 
erning Ras-GEF (i.e., Sos) activity had to be changed 
from their original values [34| in order to account for the 
known GEF- mediated activation of Ras [i(| • Specifically, 
we used Am,gdp = ^j\/,gtp = 1-56x10" 7 M and D = 1000 
(unitless) . 

Free EGF and Raf-, MEK-, and ERK-based species 
were treated as population species in the hybrid variant. 
Ras-based species were also treated as populations ex- 
cept for those that include a Sos molecule. A cell-scale 
simulation assumed 0.94 pi cytosolic and 0.22 pi nuclear 
volume, with 0.94 pi extracellular space, 10 nM ligand, 
and 4xl0 5 receptors per cell. Simulations were performed 
at fractional cell volumes, /, ranging from 0.01 to 1 with 
a lumping rate constant k_lump = 100 000/s. Results are 
shown in Fig. [7] 



Performance metrics 

HPP was evaluated for peak memory use, CPU run 
time, and accuracy as compared to particle-based NFsim 
simulations. For models where network generation is pos- 
sible (FceRI and EGFR), comparisons were also made to 
SSA simulations (as implemented in BioNetGen Q). All 
benchmarks were performed on a 2 x Intel Xeon E5520 
@ 2.27 GHz (8 cores, 16 threads, x86_64 instruction set) 
with 74 GB of RAM running the GNU/Linux operating 
system. To ensure that each process had access to 100% 
of the compute cycles of a thread, no more than 12 sim- 
ulations were benchmarked simultaneously In all cases, 
reported results are based on > 7 independent simulation 
runs. 
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Peak memory 

Peak memory usage was evaluated by peak virtual 
memory allocation reported by the operating system with 
the command: cat /proc/<PID>/status. For all bench- 
mark models, peak memory was achieved early in the 
simulation and remained steady throughout (data not 
shown) . 

CPU run time 

CPU run time was evaluated using clock time as a 
metric. Clock time was recorded using the Time : :HiRes 
Perl module. Run time included initialization as well as 
the simulation phase. Partial network expansion for HPP 
simulations was a one time cost, typically a few seconds, 
and was not included in the benchmark. 



Accuracy 

Simulation accuracy was quantified using several ap- 
proaches. For all models, the total number of reaction 
firings was recorded for each simulation run (firings of 
population-mapping rules were subtracted from the to- 
tal in HPP simulations). Based on 40 independent sam- 
ples each of NFsim and HPP (and SSA for the FceRI 
model), we tested the null hypothesis that none of the 
methods produce a larger number of firings under the 
Mann- Whitney U test [4l|, [42| . Distributions were visu- 
alized as box plots showing minimum values, maximum 
values, and quartiles. 

For the TLBR and Actin models, equilibrium distri- 
butions for key observables were also compared. These 
include the number of receptor clusters in the TLBR 
model and the length of actin polymers in the Actin 
model. 10 000 samples were collected over 100000 sec- 
onds of simulated time. Distributions were compared by 
binning samples (20 bins) and performing a two-sample 
chi-squared test |43( . For the FceRI and EGFR mod- 
els, trajectories for key observables were collected from 
40 simulation runs each. Moving averages and 5-95% 
frequency envelopes were plotted for sampling windows 
of 10 s. Due to complications of autocorrelation, a sta- 
tistical test was not applied to the dynamic trajectory 
comparison. Instead, the results were plotted for inspec- 
tion by eye. 

Software 

HPP and NFsim simulations were run using NFsim 
version 1.11, which is available for download at 
|http : / / emonet . biolo gy . yale . edu/nf sim[ All 
simulations (SSA included) were invoked through 
BioNetGen version 2.2.2, which implements the 
hybrid model generator and is distributed with 



NFsim 1.11 (see Refs. @, G3] and Sees. S3.1.5 and 
S4.1 of Text SI for details on running simulations 
with BioNetGen). NFsim and BioNetGen source code 
are available at http://code.google.eom/p/nfsim 
and http://code.google.eom/p/bionetgen, re- 
spectively. The most recent implementation of 
the hybrid model generator can be found in 
the BioNetGen-2 . 2 . 3-testing distribution (sec 
generate_hybrid_model () in the BNGAction.pm 
module at http://code.google.eom/p/bionetgen/ 
source/browse/bng2/Perl2/BNGAction.pm). Addi- 
tional documentation for BioNetGen can be found at 
http : / /bionetgen . org 



RESULTS 

A hybrid particle/population simulation approach 

In this section, we first present an algorithm for trans- 
forming a rule-based model into a partial-network sys- 
tem. We then describe a population-adapted network- 
free simulation protocol that can be applied to that sys- 
tem. We refer to the combination of these methods as the 
hybrid particle/population (HPP) simulation method. 
The basic workflow is provided in Fig. [T] The transfor- 
mation method has been implemented within the open- 
source rule- based modeling package BioNetGen 0, [|| [l(| 
and the resulting hybrid model can be simulated us- 
ing version 1.11 (or later) of the network- free simulator 
NFsim (]~6l | . For convenience, we adhere in this paper to 
the BNGL syntax, which is summarized in Sec. S3.1 of 
Text SI of the supporting material. A complete descrip- 
tion of the syntax and semantics of BNGL can be found 
in Refs. 0, [HI EH- For the theoretical foundations of 
BioNetGen see Ref. @ and Sec. S-3.2 of Text SI. 



Population species and population-mapping rules 

Given a rule-based model, the first step in the HPP ap- 
proach is to select a subset of species to treat as "lumped" 
population variables. Currently, this is left as a task for 
the modeler (an optimal strategy for selecting species for 
population lumping is a topic for future research; see 
"Discussion"). Generally speaking, however, species that 
maintain a large population throughout the duration of 
the simulation are good candidates. It is also best to 
choose species that contain only a small number of com- 
ponents and/or participate in only a few reaction rules, 
since the time complexity of the simulation algorithm is 
strongly correlated with the size of the expanded rule 
set (Table [I]). A receptor with many phosphorylation and 
binding sites, for example, is usually best treated as a 
particle because treating it as a population species would 
require enumerating a large number of species and reac- 
tions through partial network expansion (see below). 
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FIG. 1. Basic workflow of the HPP simulation 
method. Given a rule-based model and a user-specified 
set of population-mapping rules (which define the population 
species), partial network expansion (PNE) is performed to 
generate a hybrid version of the original model. The hybrid 
model is then passed to the population-adapted version (1.11 
or later) of the network-free simulator NFsim, which gener- 
ates the time-evolution trajectories for all observable quanti- 
ties specified in the original model. 



Rule-based 
model 



Hybrid 
model 



Population- 
mapping rules 




Observable 
trajectories 



The next step is to map each (structured) species se- 
lected for lumping to an associated unstructured species, 
which wc refer to as a population species. Population 
species differ from standard BNGL species in that they 
possess a property, called a count, which records their 
current population. The mapping is accomplished us- 
ing a population-mapping rule, which follows the same 
syntactic conventions as a standard BNGL rule. For ex- 
ample, 



Egf(r) -> pop_Egf() k_lump 



maps the unbound EGF ligand Egf (r) to the unstruc- 
tured population species pop_Egf ( ) . The rate parameter 
here is termed the lumping rate constant. The basic idea 
is that during the course of a simulation complexes may 
dissociate and release instances of structured species that 
have been selected for lumping, Egf (r) in this case. The 
population-mapping rules gather these instances back up 
into the population species. The lumping rate constant 
thus determines how long these unlumped particles per- 
sist in the system. Its value should ideally be set to 
infinity as the HPP method is formally exact only for an 
infinite lumping rate (see below). However, if infinity is 
not an option in the network-free simulator being used 
(currently it is not in NFsim) then setting it to a value 
that is "large" with respect to the model dynamics will 
suffice (see Figs.HHH panels C and D). Note that the ac- 
tion of the above rule is to delete the Egf (r) molecule and 
increment by one the counter of the pop_Egf () molecule 
(only a single instance of a population species ever exists 
in the simulation environment). 



Partial network expansion 

The ultimate goal of the HPP method is to replace 
in the simulation environment large numbers of indistin- 
guishable particles with small numbers of lumped objects 
containing population counters (the population species), 
thus significantly reducing memory usage. In order to 
accomplish this without losing information regarding the 
dynamics of the system, we must partially expand the 
rule set until all reactions in which the population species 
appear as reactants are enumerated. We refer to this pro- 
cedure as partial network expansion (PNE). 

The PNE algorithm is comprised of three basic steps, 
applied to each reaction rule in a rule-based model: 

1. For each reactant pattern, identify all matches of 
the pattern into the (structured) species selected 
for lumping. Also collect a self-match of the reac- 
tant pattern unless it equals a species selected for 
lumping (see below). 

2. Derive an expanded set of rules by applying the 
original rule to all possible combinations (the carte- 
sian product) of the reactant pattern matches col- 
lected in step 1. 

3. For each derived rule, replace each reactant and 
product pattern that equals a species selected for 
lumping with its unstructured population counter- 
part. 

The result is an expanded rule set consisting of three gen- 
eral types of rules: (i) particle rules, comprised of reac- 
tant patterns that exclusively match particles; (ii) mixed 
particle/population rules, which operate on a mixture of 
particles and population species; (hi) pure population re- 
actions, which operate exclusively on population species. 
The expanded rule set has the property that every possi- 
ble action of the original rules on the population species 
is enumerated while actions on particle objects remain 
pattern-based (i.e., non-enumerated). A formal mathe- 
matical basis for the PNE algorithm, with pseudocode, 
is provided in Sec. S4.2 of Text SI. 

Note that the set of particle rules in the expanded rule 
set is just the set of reaction rules in the original model, 
less those that exclusively match structured species se- 
lected for lumping. The latter is a consequence of ex- 
cluding self-matches in Step 1 of the PNE algorithm for 
reactant patterns that equal structured species selected 
for lumping. This is an important point. The reason 
why this is done is that, for large k_lump, there will al- 
most never exist any instances of the structured species 
selected for lumping in the system. As mentioned above, 
the role of the population-mapping rules is to gather in- 
stances of structured species selected for lumping up into 
the population species. A large value of k_lump means 
that this will be accomplished almost instantaneously; 
an infinite value makes the method exact. As such, rules 
involving those reactant patterns need not be retained 
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since they will (essentially) never fire. Removing them 
decreases the size of the partially-expanded rule set and 
improves the efficiency of the method. Note, however, 
that for the sake of completeness we have implemented 
the alternative PNE method in BioNetGen (version 2.2.3 
or later) that retains all of the self-matches and, hence, 
gives exact results for any value of k_lump (see Sec. S4.1 
of Text SI for instructions on how to call this method). 
For a select number of examples, we have confirmed that 
both methods give statistically identical results for suffi- 
ciently large k_lump and that the alternative method is 
less efficient (data not shown). 



A simple example of PNE 

PNE is best illustrated through an example. In 
Fig. [21 we present a simple rule-based model of recep- 
tor activation (for brevity, parameters, initial popula- 
tions, and output observables arc omitted; sec Text S2 
of the supporting material for the complete model in 
BNGL format). The model includes a ligand, L, its 
cognate receptor, R, and three cytosolic proteins, A, B, 
and C, that are recruited to the phosphorylated recep- 
tor. The 16 rules (six unidirectional and five reversible), 
describing ligand-rcccptor binding, receptor phosphory- 
lation/dcphosphorylation, and protein recruitment, en- 
code a reaction network comprised of 56 species and 287 
reactions. In applying the HPP method, eight species 
are selected for lumping: free ligand, free A, B and C, and 
complexes of A, B and C that exclude the receptor. Re- 
ceptor complexes are treated as particles because there 
are many possible receptor configurations (48 total). 

In Fig. [3l a step-by-step application of PNE to rule 
Iff (forward) of Fig. [2] is presented. First, both reactant 
patterns are matched to the structured species selected 
for lumping. Reactant pattern 1 has one match, while 
reactant pattern 2 has two. Note that since neither reac- 
tant pattern exactly equals a species (i.e., is isomorphic 
to one) the self match (identity automorphism) is added 
to the reactant match list in both cases. This is required 
for generating mixed particle-population rules, where one 
reactant is an unstructured population species and a sec- 
ond is a structured particle. 

Next, the rule is applied to each possible reactant 
set (the cartesian product of the reactant match lists). 
This results in a set of six derived rules. The struc- 
tured species are then replaced in these rules by their 
associated unstructured population species, resulting in 
one pure particle rule (the original rule), three mixed 
particle/population rules, and two pure population reac- 
tions. Including the population-mapping rules, the hy- 
brid model contains a total of 42 rules, more than the 
original 16 but significantly fewer than the 287 reactions 
of the fully expanded network. The complete partially- 
expanded HPP model for this system can be found in 
Text S3 of the supporting material. 



Population- adapted network-free simulation 

Since an HPP model is properly a rule-based model, 
network-free simulation algorithms are readily adapted 
to the HPP method by the addition of: (i) a population 
count property for each molecule object; (ii) a transfor- 
mation that performs population increments and decre- 
ments; (iii) a method for calculating population- weighted 
propensities (rates). The population- weighted propen- 
sity of an expanded rule is given by the formula 

B "=-n - w 

S » r=l \x=l / 

where fc M is the rate constant, s M is the symmetry factor 
(see Note 4.21 of Ref. @), is the number of reactant 
patterns in the rule (i.e., the molecularity) , X is the total 
number of complexes in the system, p(x) is the popula- 
tion of complex x (unity in the case of particles), and 
f]n,r( x ) is the number of matches of reactant pattern r 
into complex x. 

Note that in the case of symmetric population reac- 
tions [e.g., pop_A() + popJVO -> A(a!0) .A(a!0)], the 
possibility of a null event must be calculated in order to 
prevent self reactions. This is accomplished by rejecting 
the event with probability l/p(x). Furthermore, since 
population species have zero components, if complex x is 
a population species and ^^^(x) = 1 (can only equal or 
1), then ri^ r (y) = for all y^x. This property is useful 
because it guarantees that a reactant pattern matches 
either particles or population species exclusively, never 
a mixture of both. Thus, once a rule has been selected 
to fire, the particles to participate in that rule can be 
selected from a uniform distribution rather than from a 
population- weighted distribution. 



Performance analyses 

Peak memory use and CPU run time 

Figures |4H71 panels A, show absolute and relative (with 
respect to NFsim) peak memory use as a function of cell 
fraction, /, for all models considered. We see that in all 
tested cases HPP requires less memory than NFsim. For 
NFsim, we see the expected linear relationship (Table HJ 
between peak memory use and system size (see below). 
For HPP, peak memory use also scales linearly but with 
a smaller slope. This is expected because as the system 
size increases a portion of the added particles, and hence 
memory cost, is always absorbed by the population sub- 
network. We also see that the relative memory benefits 
of HPP go to zero at the smallest cell fractions. This is 
because opportunities for compression decrease as species 
populations approach zero. In cases where network gen- 
eration is possible (FceRI, Fig. EK; EGFR, Fig. EK), 
we see for the SSA the expected constant relationship 
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FIG. 2. Simple receptor activation model in BNGL format. Abridged; see Text S2 of the supporting material for the 
complete model. 

Molecule. Type }'>> ». nyi 

1 L(r> extracellular ligand 

2 RC1 ,a~O~P,b"-0"^P) membrane receptor with two phosphorylation sites 

3 A(r,b~0~P) cytoeolic molecule {recruits to phosphorytated receptor) 

1 Btr , c) cytoaolic molecule {recruits to phaapho-rcceptor or phoepho-A] 



5 C(b> cyUwolic molecule {binds to B) 


Rcac ts on Rule 


Rate constant (a) 


Description 


I Ur> + RU) <-> L(r'l) .RCl'l) 


kpl.knl 


receptor-ligand hindiug 


2 L(r!l).R(l»l.a~0> -> L(r ! 1) .R(l«l,a~P) 


ka 


site a phosphorylation 


3 L(T!l>.R<l»l,b-0> -> Ur! 1) .RUU.b^P) 


.<2 


site b phosphorylation 


1 R(a--P> -> R<a^O) 


it 3 


site a dephoephorylation 


S RCb-P) -> R{b~0) 


A3 


M1«- b dcphii^j 'ln.'i \ Lit i< >ij 


G R(a~P) •» A(r) <-> R(a~P! 1) .A(r ! 1) 


kp4.km4 


phoiiphory3.it fil R binding A 


7 R(b~F) + BCD <-> R(W!1) .B(r»l) 


kp5.km5 


phoaphorylated R binding B 


8 B(c) + C(b) <-> B(c!i) .GCbU) 


kp£ .kmr6 


B sind C binding 


?) R(a~P!l) .Atr»l,b~o) -> R{a-^P» 1) . A(r ! 1 ,b^.P) 


k7 


recruited A phosphorylation 


10 A(b^P) -> A{b^0) 


kS 


A depbosphorylatian 


11 A(b-P) + BCr) <-> A(b~P!l) .B(r!l) 


kp9.km9 


phoHphorylatcd A binding B 


Population-mapping rules: 






Structured apeciea 




Lumping role constant 


1 LCr> -> 


PlO 


k-lump 


2 A(r,b-0) -> 


P2() 


kJLump 


3 A(r ,b~P) -> 


P30 


k 1 M p 


1 A(r t b~P»l) .B(r!l t c) -> 


P4() 


k-lunp 


$ ACr f b-P»lJ .B(r!l,<:!2) .C(b!2) -> 


P&C) 


k-lump 


6 B(r,c) -> 


P6() 


k-lump 


7 B(r t cH).CCb'l) -> 


P7(> 


k l um p 


S C(b> -> 


P3C) 


k-lunp 



between memory usage and system size (Table HJ. The 
SSA also requires more memory than both NFsim and 
HPP for all cell fractions considered. This is due to the 
high memory cost of the dependency update graph [24[ 
(standard in SSA implementations), which scales with 
the product of the number of reactions in the network 
and the average connectivity per reaction. 

Note that, for NFsim, the second-to-last points in the 
absolute memory usage plots (panels A, left) actually lie 
slightly above the linear trend in all cases. This is due 



to the fact that memory is allocated in NFsim in log- 
arithmically growing intervals (a common approach in 
software design) up to a threshold, after which alloca- 
tion occurs in intervals of a constant size. Basically, at 
the second-to-largest volumes memory allocation is still 
in the logarithmic phase, where greater excesses of mem- 
ory are likely to be allocated (the largest volumes are 
in the constant phase). To address this, we have rerun 
simulations for all models at all cell fractions using a cus- 
tom compiled version of NFsim that allocates memory in 
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FIG. 3. Partial network expansion (PNE) applied to Rule llf of Fig. [2j See Text S3 of the supporting material for 
the complete, partially-expanded model. 

Reaction nde: A(b-P) + B(r) -> A{b-P! 1) .B(r! 1) kp9 



Description; "Bind component sites b~*P and r" 



] : 


Find Reactant Pattern (RP) Matches 






Mali In \ lit' 1 1 '■•,>[; iMti 






I 


A{b -P) identity automorphism 


1 B(r) identity automorphism 


j 


JKr.b^P) P3() 




3 ECr.c) P6<) 
3 B(r,c!l) .C(b«l) P70 


2: 


Rule Expansion: Apply rule to each reactant set in the cartesian product of reactant matches 


I 


A(b-P) + B(r> 


-> 


A(b-P'l) .B<r!l) kpy 


2 


A(b-P> + BCr.c) 


-> 


A(b--P'l) .B(r!l,c) kp9 


! 


A(b-vP) + B(r,c! 1) .C(b1 1) 


-> 


A(b-P'2).B<r!2,c!l).C(b!l) kp9 


1 


A(r,b--P) + BCr) 


-> 


A(r,b-P! 1) ,BCr! 1) kp9 




A(r F b-.P) + B(r.c) 


-> 


A(r,b-P! 1) .B(r! l.c) kp9 


Ii 


A(r,b- P) + BCr.c'l!) .CKb'l) 


-> 


A<r,b^P!2) .B(r!2,c!l) C(bll) kp9 




Rule Rewriting: Substitute structured specie* j^raphs with unstructured population species 


I 


A(b-P) + BCr) 


-> 


ACb~P'l) .B<r! l) kp9 


J 


A(b-P) + P6(> 


-> 


A(b-P»l).B(r!l,c) kp9 


3 


A(b-~P) + P7(> 


-> 


A(b-.P'2).B(r'2.c!l).C(b!l) kp9 


I 


P3Q + BCx) 


~> 


A(r,b--P! 1) .B(r! 1) kp9 


5 


P30 + Pfi() 


-> 


P4Q kp9 


6 


P3C) + P7Q 


-> 


PSO kp9 



smaller intervals. The data confirm that the memory 
scaling for NFsim with system size is in fact linear (data 
not shown). 

Figures [4H3 panels B, show absolute and relative (with 
respect to NFsim) CPU run times as a function of cell 
fraction. Generally speaking, HPP and NFsim run times 
are comparable in all cases, indicating that the reduc- 
tions in memory use seen in Figs. SHU panels A, are 
not achieved at the cost of increased run times. In fact, 
HPP is slightly faster than NFsim in most cases. This 
is because operations on population species (e.g., incre- 
ment/decrement) are less costly than the graph opera- 
tions applied to particles (e.g., subgraph matching). Also 
note in Fig. 0J3 the expected quadratic relationship be- 
tween run time and system size for the TLBR model 
(Table [p , which is due to the solution-gel phase transi- 
tion [TJlli]. In Figs. and[7B, we see that the SSA 
is slower than both NFsim and HPP for all cell fractions 
considered. The difference is most pronounced at small 



cell fractions and is much more significant for EGFR than 
for FceRI. This is as expected since previous work [l6| 
has shown that network-free methods significantly out- 
perform network-based stochastic methods when popu- 
lation levels are small and network sizes are large (the 
EGFR network is significantly larger than the FceRI net- 
work; Table [n}. 

Finally, we see in Fig. |6j3 that the CPU run time in- 
creases as we increase the number of species treated as 
populations in the FceRI model, even though the mem- 
ory usage remains constant (Fig.[TJJV). This is interesting 
because it suggests that the FceRI: 1 variant, with free 
ligand as the only population species, is near-optimally 
lumped for the cell fractions considered. Evidently, the 
population levels of the lumped cytosolic Lyn and Syk 
species in the FceRI:6 variant never get large enough to 
provide significant memory savings (any savings that are 
achieved are offset by the additional mixed rules and re- 
actions added to the rule set). 
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FIG. 4. HPP performance analysis for the TLBR 
model. (A) peak memory usage (left: absolute, right: rel- 
ative to NFsim); (B) CPU run time (left: absolute, right: 
relative to NFsim); (C) number of reaction events fired dur- 
ing a simulation (/ = 0.01); (D) equilibrium distribution of 
number of clusters (/ = 0.01). 



FIG. 5. HPP performance analysis for the actin poly- 
merization model. (A) peak memory usage (left: absolute, 
right: relative to NFsim); (B) CPU run time (left: absolute, 
right: relative to NFsim); (C) number of reaction events fired 
during a simulation (/ = 0.01); (D) equilibrium distribution 
of actin polymer lengths (/ = 0.01). 




Accuracy 

Figures HHTJ panels C, show distributions of the num- 
ber of reaction firings per simulation run for each of 
the simulation methods considered. For all models, the 
distributions, as illustrated by box plots, are similar 
for NFsim, HPP, and SSA (the latter for FceRI only; 
Fig. [Bp). The two-sided Mann- Whitney U test [H Hi] 
was unable to reject the null hypothesis at the 5% sig- 
nificance level for all models (TLBR: p = 0.25; Actin: 
p = 0.90; FceRI: p = 0.27; EGFR: p = 0.07), providing 
strong evidence that HPP produces statistically identi- 
cal numbers of reaction firings to both NFsim and SSA. 

Figures panels D, compare distributions obtained 
from NFsim and HPP simulations of all models. In 



Fig. |4)3, we show equilibrium distributions of the num- 
ber of receptor clusters in the TLBR model (/ = 0.01). 
In Fig. [5)1), equilibrium distributions of polymer lengths 
in the Actin model are shown (/ = 0.01). In both 
cases, we could not reject the null hypothesis of the 
equivalence of the NFsim and HPP distributions (TLBR: 
p = 0.50; Actin: p = 0.66). In Fig. |6jD, time courses 
for 7-phosphorylated receptor and receptor-recruited, a- 
phosphorylated Syk arc shown (/ = 0.01). In Fig. [7)3, 
time courses for membrane-recruited (active) SOS and 
nuclear phospho-ERK are presented (/ = 0.05). Although 
we did not perform any statistical tests, visual inspection 
of the trajectories shows that in all cases the NFsim and 
HPP results arc virtually identical. 
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FIG. 6. HPP performance analysis for the FceRI sig- 
naling model. (A) peak memory usage (left: absolute, right: 
relative to NFsim); (B) CPU run time (left: absolute, right: 
relative to NFsim); (C) number of reaction events fired dur- 
ing a simulation (/ = 0.01); (D) timecourses (means and 
5-95% envelopes; / = 0.01) for 7-phosphorylated receptor 
(top) and receptor-recruited, a-phosphorylated Syk (bottom). 
SSA timecourses are virtually indistinguishable and have been 
omitted for clarity. 



FIG. 7. HPP performance analysis for the EGFR sig- 
naling model. (A) peak memory usage (left: absolute, right: 
relative to NFsim); (B) CPU run time (left: absolute, right: 
relative to NFsim); (C) number of reaction events fired during 
a simulation (/ = 0.05); (D) timecourses (means and 5-95% 
envelopes; / = 0.05) for activated Sos (top) and phosphory- 
lated ERK (bottom). Due to high computational expense, 
SSA statistics were not collected in (C) and (D). 
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DISCUSSION 

We have presented a hybrid particle/population sim- 
ulation approach for rule-based models of biological sys- 
tems. The HPP approach is applied in two stages 
(Fig. [T]): (i) transformation of a rule-based model into 
an equivalent hybrid form by partially expanding the 
network around a selected set of population species; (ii) 
simulation of the transformed model using a population- 



adapted network-free simulator. The method is formally 
exact for an infinite population lumping rate constant, 
but can produce statistically exact results in practice pro- 
vided that a sufficiently large value is used (Figs. 0H3 
panels C and D). As currently implemented, the primary 
advantage of the HPP method is in reducing memory us- 
age during simulation (Figs.HHH panels A). Importantly, 
this is accomplished with little to no impact on simula- 
tion run time (Figs.0H71 panels B). 
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FIG. 8. Cost of running simulations on the Amazon 
Elastic Compute Cloud (EC2). The minimum cost as 
a function of memory requirement was calculated based on 
January 2012 pricing (http://aws.amazon.com/ec2/) of all 

Standard, High-CPU , and High-Memory EC2 instances (see 
Sec. SI of Text SI for details of the calculation) . Also included 
are values for NFsim, HPP, and SSA simulations of the EGFR 
model at cell fraction f=l. 




o.al 1 ■ ■ ■ 
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Memory requirement (GB/simulalior*) 



Monetary cost analysis 

In order to frame our results within a real-world con- 
text, we have estimated the cost of simulation based 
on hourly rates of on-demand instances on the Amazon 
Elastic Compute Cloud (EC2). In Fig. we show the 
hourly cost (per "effective compute unit" ) of simulation 
as a function of required memory per simulation (details 
of the calculation can be found in Sec. SI of Text SI). 
Also included in the plot are values for HPP (0.3 GB), 
NFsim (2.1 GB), and SSA (22.0 GB) simulations of the 
EGFR model at cell fraction f = l (Fig. UK)- 

The EC2 offers various "instance types" with different 
CPU and memory resources, including High-CPU and 
High-Memory varieties. Our calculations show that be- 
low 1.82 GB of required memory High-CPU instances 
are the most cost effective. Above this threshold High- 
Memory instances are the better option. We see that the 
HPP simulation falls below this cutoff while both NFsim 
and SSA lie above. There is a quantifiable benefit, there- 
fore, to reducing memory usage in this case. HPP sim- 
ulations on the EC2 would be ~2.5 and ~33 times less 
expensive than NFsim and SSA, respectively (HPP is 
slightly faster than NFsim and significantly faster than 
SSA; see Fig.EP). The whole-cell EGFR model thus pro- 
vides a tangible illustration of the benefits of reducing 
memory load for dynamic simulations of large biochemi- 
cal models, approaching the scale of a whole cell. 

The benefits of reducing memory usage with HPP 
are further emphasized by the fact that common model 



analysis techniques such as parameter estimation (e.g., 
Ref. [Ill), sensitivity analysis (e.g., Ref. [IB]), and statis- 
tical model checking (e.g., Ref. [471]). typically entail run- 
ning a large number (thousands or more) of independent 
simulation trials. These trials can be distributed across 
a computer cluster and run independently, which is effi- 
cient because there is little need for interprocess commu- 
nication. However, typical hardware configurations have 
a fixed number of processor cores with a large, common 
pool of memory. If each independent process consumes 
a large amount of memory, then it may be impossible to 
fully utilize each CPU core. Reducing memory require- 
ments can thus improve the effective use of cluster and 
cloud-based computing resources. 

Finally, realistic simulations of whole-cell dynamics re- 
quire accounting for the highly crowded and inhomoge- 
neous environment of the cell interior. A common ap- 
proach is to discretize a system into multiple well-mixed 
subvolumes, with reactions taking place within, and dif- 
fusion occurring between, each subvolume |H, |49[ . How- 
ever, doing so can significantly increase the number of 
objects (species, reactions, rules, etc.) that must be rep- 
resented and retained in memory during the course of a 
simulation. The ability to reduce memory usage is thus 
of particular value in the case of spatial simulations. We 
expect that HPP can be successfully applied to spatial 
models without modification and can provide several-fold 
reductions in memory use without substantial impact on 
run time. 



Future directions 

We have shown that peak memory use for HPP scales 
linearly with system size with a slope that is smaller than 
that for NFsim (Figs. HJ-[7l panels A). Moreover, when 
network generation is possible, HPP memory use at the 
system sizes considered is significantly less than for SSA, 
which has approximately constant scaling (Figs. [6]4_ and 
EJV). However, the linear scaling of HPP and constant 
scaling of SSA means that with further increases in the 
system size there will invariably come a point where HPP 
memory use exceeds that of SSA. Intuitively, we under- 
stand that this is because species that are rare at small 
volumes, and hence chosen to be treated as particles, be- 
come plentiful at large volumes. Since we require that 
population species be selected a priori, there is currently 
no way to address this problem other than through in- 
tervention by the user. 

We propose to develop an enhanced version of HPP 
that tracks the populations of particle species and auto- 
matically lumps them into population species when their 
number exceeds a certain threshold. PNE would be per- 
formed every time a new species is lumped, meaning that 
the memory load would never exceed that of the fully- 
expanded network. In Fig. [51 we provide a qualitative 
sketch of how the memory usage of such an "automated 
HPP" (aHPP) method would scale with system size for 
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FIG. 9. Memory use vs. simulated volume for different 
simulation methods, including a hypothetical "auto- 
mated HPP" (aHPP). For finite networks, aHPP memory 
use plateaus once the entire reaction network has been gen- 
erated. For infinite networks, the scaling at large volumes 
should fall somewhere between constant and linear (no worse 
than HPP) depending on the model (see Sec. S2 of Text SI 
for an analysis). 




t 1 r 



Volume 



erations in the population subnetwork will be limited. 
However, if those fast dynamics are due to the presence 
of a large number of identical particle species (i.e., sub- 
optimal lumping), then automated lumping via aHPP 
could alleviate the problem. This further emphasizes the 
value of the proposed aHPP method. Optimizing both 
memory usage and speed of simulation will not only de- 
crease the time to results but will impact, in a tangible 
way, the cost of doing computational research. 



SUPPORTING INFORMATION 

Text SI: Sec. SI : Details of the monetary cost 
analysis shown in Fig. [8] Sec. S2 : Computational 
complexity analysis for the hypothetical aHPP method 
(Fig. EJ; Sec. S3 : BioNetGen/BNGL basics and BNGL 
formalism; Sec. S4 : Generating and simulating HPP 
models using BioNctGcn/NFsim, formal foundation of 
the PNE algorithm. 

Text S2: Complete BNGL file for the simple receptor 
activation model of Fig. [5] 
(recAct_example . bngl) 



a model with a finite network. Included for comparison 
arc HPP, NFsim, and SSA scalings as well. For infinite 
networks (such as TLBR and Actin), we expect aHPP 
memory use to scale somewhere between constant and 
linear (no worse than HPP) at large volumes, depending 
on the model (see Sec. S2 of Text SI for an analysis). 

Furthermore, the cost analysis of Fig. [8] illustrates that 
if, in addition to reducing memory usage, simulation run 
times can be decreased, then even greater monetary ben- 
efits are possible. There has been much research into 
methods for accelerating stochastic simulations of chem- 
ical reaction networks. These include r leaping [23| . 
originally proposed by Gillespie I5QI and improved upon 
by Gillespie and others Q3, tHHl!. The HPP method 
provides a unique opportunity for the application of r- 
leaping because, unlike in pure particle-based methods, 
there exists a partial network of reactions that act on 
population species. We believe that it would be rela- 
tively straightforward to integrate a r-lcaping method 
into the HPP by applying it exclusively to the population 
subnetwork while retaining the network-free approach in 
the particle-based component. We have recently imple- 
mented a r-leaping variant in BioNetGen, known as the 
partitioned- leaping algorithm [l3| , and are actively work- 
ing on integrating it with the HPP. 

Importantly, the integrated r-leaping/HPP method 
will involve calculating different time steps for the par- 
ticle and population subcomponents of the system and 
setting the global time step, r, to the smaller of the two. 
As such, if the fastest dynamics in the model are con- 
tained within the particle-based component, then accel- 



Text S3: Partially-expanded (HPP) version of the 
simple receptor activation model of Fig. [5] generated 
using the method outlined in Fig. [3] 
(recAct_example_hpp . bngl) 

Text S4: BNGL file for the TLBR model, 
(tlbr .bngl) 

Text S5: HPP version of the TLBR model. 
(tlbrJipp . bngl) 

Text S6: BNGL file for the Actin model. 
(actin_simple .bngl) 

Text S7: HPP version of the Actin model. 
(actin_simple_hpp .bngl) 

Text S8: BNGL file for the FceRI model, 
(f cer i_gamma2 . bngl) 

Text S9: HPP version of the FceRI model with free 
ligand treated as the only population species, 
(f ceri_gamma2_hppl .bngl) 

Text S10: HPP version of the FceRI model with free 
ligand, cytosolic Lyn and all four phosphorylation states 
of cytosolic Syk treated as population species, 
(f cer i_gamma2_hpp6 . bngl) 

Text Sll: BNGL file for the EGFR model, 
(egf r_extended . bngl) 
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Text S12: HPP version of the EGFR model, 
(egf r_extended_hpp . bngl) 
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